Overview

Dataset statistics

Number of variables: 18
Number of observations: 1000
Missing cells: 317
Missing cells (%): 1.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 140.8 KiB
Average record size in memory: 144.1 B
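The overview figures are plain aggregates. For this table, the 317 missing cells are exactly the per-column counts listed under Alerts (29 + 17 + 106 + 165), and 317 out of 18 × 1000 = 18,000 cells is 1.76%, shown as 1.8%. The helper below is a hypothetical sketch in plain Python, not pandas-profiling's implementation: it assumes rows are dicts with `None` marking a missing cell, and that cell values are hashable so duplicate rows can be found with a set.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]  # one record; None marks a missing cell


def dataset_overview(rows: List[Row]) -> Dict[str, float]:
    """Hypothetical helper: overview figures for a list of records."""
    columns = sorted({key for row in rows for key in row})
    n_vars, n_obs = len(columns), len(rows)
    total_cells = n_vars * n_obs

    # A cell is missing if the key is absent or holds None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row is a duplicate if the same full tuple of values was seen earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Cross-check of the report's own figure: 317 of 18 x 1000 cells.
print(f"{100 * 317 / (18 * 1000):.1f}%")  # 1.8%
```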

Variable types

Categorical: 9
Numeric: 6
DateTime: 1
Boolean: 2

Alerts

deceased_indicator has constant value "False" (Constant)
country has constant value "Australia" (Constant)
first_name has a high cardinality: 940 distinct values (High cardinality)
last_name has a high cardinality: 961 distinct values (High cardinality)
job_title has a high cardinality: 184 distinct values (High cardinality)
address has a high cardinality: 1000 distinct values (High cardinality)
postcode is highly correlated with property_valuation (High correlation)
property_valuation is highly correlated with postcode (High correlation)
Rank is highly correlated with Value (High correlation)
Value is highly correlated with Rank (High correlation)
gender is highly correlated with deceased_indicator and 1 other field (High correlation)
job_industry_category is highly correlated with deceased_indicator and 1 other field (High correlation)
owns_car is highly correlated with deceased_indicator and 1 other field (High correlation)
state is highly correlated with deceased_indicator and 1 other field (High correlation)
wealth_segment is highly correlated with deceased_indicator and 1 other field (High correlation)
deceased_indicator is highly correlated with gender and 5 other fields (High correlation)
country is highly correlated with gender and 5 other fields (High correlation)
gender is highly correlated with job_industry_category (High correlation)
job_industry_category is highly correlated with gender (High correlation)
postcode is highly correlated with state and 1 other field (High correlation)
state is highly correlated with postcode (High correlation)
last_name has 29 (2.9%) missing values (Missing)
DOB has 17 (1.7%) missing values (Missing)
job_title has 106 (10.6%) missing values (Missing)
job_industry_category has 165 (16.5%) missing values (Missing)
first_name is uniformly distributed (Uniform)
last_name is uniformly distributed (Uniform)
address is uniformly distributed (Uniform)
address has unique values (Unique)
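Several of these alerts follow from simple counting rules. The sketch below is a hypothetical helper, not pandas-profiling's code: it reproduces the wording of the Constant, High cardinality, Missing and Unique alerts for a single column. The high-cardinality threshold of 50 distinct values is an assumption for illustration (under it, job_title with 184 distinct values is flagged and wealth_segment with 3 is not). The Uniform and High correlation alerts rely on statistical tests and are left out.

```python
from collections import Counter
from typing import List, Optional, Sequence

# ASSUMED threshold for illustration only; the report's real setting lives in
# its configuration and may differ.
HIGH_CARDINALITY_THRESHOLD = 50


def column_alerts(name: str, values: Sequence[Optional[str]]) -> List[str]:
    """Hypothetical helper: alert strings for one categorical column."""
    alerts: List[str] = []
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    counts = Counter(present)
    n_distinct = len(counts)

    if n_distinct == 1:
        only = next(iter(counts))
        alerts.append(f'{name} has constant value "{only}" (Constant)')
    if n_distinct > HIGH_CARDINALITY_THRESHOLD:
        alerts.append(
            f"{name} has a high cardinality: {n_distinct} distinct values (High cardinality)"
        )
    if n_missing:
        pct = 100.0 * n_missing / len(values)
        alerts.append(f"{name} has {n_missing} ({pct:.1f}%) missing values (Missing)")
    if present and n_distinct == len(present):
        # Every non-missing value occurs exactly once.
        alerts.append(f"{name} has unique values (Unique)")
    return alerts


print(column_alerts("country", ["Australia"] * 1000))
```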

Reproduction

Analysis started: 2022-03-21 20:50:13.551679
Analysis finished: 2022-03-21 20:50:31.889736
Duration: 18.34 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

first_name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 940
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Rozamond: 3
Mandie: 3
Dorian: 3
Muffin: 2
Tessa: 2
Other values (935): 987

Length

Max length: 13
Median length: 6
Mean length: 6.087
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 883
Unique (%): 88.3%
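Distinct counts how many different values occur; Unique counts the values that occur exactly once. The percentages appear to use the non-missing rows as the denominator: for last_name, 961 distinct values out of 1000 - 29 = 971 present rows is 99.0%, and 951 unique values is 97.9%. A minimal sketch of both counts, assuming `None` marks a missing value (`distinct_and_unique` is a hypothetical helper):

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def distinct_and_unique(values: Sequence[Optional[str]]) -> Dict[str, float]:
    """Distinct = number of different values; Unique = values seen exactly once."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    n = len(present)  # percentages relative to non-missing rows
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100.0 * unique / n if n else 0.0,
    }


# "Rozamond" occurs twice, so it is distinct but not unique.
print(distinct_and_unique(["Rozamond", "Rozamond", "Tessa", "Chickie", None]))
```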

Sample

1st row: Chickie
2nd row: Morly
3rd row: Ardelis
4th row: Lucine
5th row: Melinda

Common Values

Value | Count | Frequency (%)
Rozamond | 3 | 0.3%
Mandie | 3 | 0.3%
Dorian | 3 | 0.3%
Muffin | 2 | 0.2%
Tessa | 2 | 0.2%
Suzy | 2 | 0.2%
Farlie | 2 | 0.2%
Kippar | 2 | 0.2%
Maddalena | 2 | 0.2%
Nobe | 2 | 0.2%
Other values (930) | 977 | 97.7%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
rozamond | 3 | 0.3%
dorian | 3 | 0.3%
mandie | 3 | 0.3%
muffin | 2 | 0.2%
latrena | 2 | 0.2%
shane | 2 | 0.2%
anthony | 2 | 0.2%
barth | 2 | 0.2%
cami | 2 | 0.2%
aloysius | 2 | 0.2%
Other values (930) | 977 | 97.7%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

last_name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 961
Distinct (%): 99.0%
Missing: 29
Missing (%): 2.9%
Memory size: 7.9 KiB

Borsi: 2
Sissel: 2
Shoesmith: 2
Burgoine: 2
Hallt: 2
Other values (956): 961

Length

Max length: 21
Median length: 7
Mean length: 7.026776519
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 951
Unique (%): 97.9%

Sample

1st row: Brister
2nd row: Genery
3rd row: Forrester
4th row: Stutt
5th row: Hadlee

Common Values

Value | Count | Frequency (%)
Borsi | 2 | 0.2%
Sissel | 2 | 0.2%
Shoesmith | 2 | 0.2%
Burgoine | 2 | 0.2%
Hallt | 2 | 0.2%
Minshall | 2 | 0.2%
Eade | 2 | 0.2%
Van den Velde | 2 | 0.2%
Sturch | 2 | 0.2%
Crellim | 2 | 0.2%
Other values (951) | 951 | 95.1%
(Missing) | 29 | 2.9%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
van | 3 | 0.3%
de | 3 | 0.3%
den | 3 | 0.3%
borsi | 2 | 0.2%
sissel | 2 | 0.2%
crellim | 2 | 0.2%
sturch | 2 | 0.2%
velde | 2 | 0.2%
eade | 2 | 0.2%
minshall | 2 | 0.2%
Other values (960) | 963 | 97.7%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

gender
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Female: 513
Male: 470
U: 17

Length

Max length: 6
Median length: 6
Mean length: 4.975
Min length: 1
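The length statistics are taken over the string length of every row, not over the distinct categories. Expanding gender's three categories by their counts reproduces the figures above: (513 × 6 + 470 × 4 + 17 × 1) / 1000 = 4.975, and since more than half the rows are "Female", the median length is 6. A sketch, where `length_stats` is a hypothetical helper:

```python
import statistics
from typing import Dict, Mapping


def length_stats(counts: Mapping[str, int]) -> Dict[str, float]:
    """Max/median/mean/min string length, weighting each category by its row count."""
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


gender = length_stats({"Female": 513, "Male": 470, "U": 17})
print(gender)  # {'max': 6, 'median': 6.0, 'mean': 4.975, 'min': 1}
```

The same helper applied to wealth_segment's counts gives a mean length of 14.215, also matching the report.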

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value | Count | Frequency (%)
Female | 513 | 51.3%
Male | 470 | 47.0%
U | 17 | 1.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value | Count | Frequency (%)
female | 513 | 51.3%
male | 470 | 47.0%
u | 17 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

past_3_years_bike_related_purchases
Real number (ℝ≥0)

Distinct: 100
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.836
Minimum: 0
Maximum: 99
Zeros: 9
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 26.75
median: 51
Q3: 72
95-th percentile: 94
Maximum: 99
Range: 99
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 27.79668613
Coefficient of variation (CV): 0.5577631858
Kurtosis: -1.088048884
Mean: 49.836
Median Absolute Deviation (MAD): 22.5
Skewness: -0.06562186172
Sum: 49836
Variance: 772.6557598
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
60 | 20 | 2.0%
59 | 18 | 1.8%
42 | 17 | 1.7%
70 | 17 | 1.7%
11 | 16 | 1.6%
37 | 16 | 1.6%
47 | 15 | 1.5%
84 | 14 | 1.4%
67 | 14 | 1.4%
57 | 14 | 1.4%
Other values (90) | 839 | 83.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9 | 0.9%
1 | 8 | 0.8%
2 | 9 | 0.9%
3 | 9 | 0.9%
4 | 10 | 1.0%
5 | 13 | 1.3%
6 | 10 | 1.0%
7 | 13 | 1.3%
8 | 7 | 0.7%
9 | 5 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
99 | 9 | 0.9%
98 | 6 | 0.6%
97 | 11 | 1.1%
96 | 9 | 0.9%
95 | 8 | 0.8%
94 | 12 | 1.2%
93 | 9 | 0.9%
92 | 5 | 0.5%
91 | 8 | 0.8%
90 | 6 | 0.6%

DOB
Date

MISSING

Distinct: 958
Distinct (%): 97.5%
Missing: 17
Missing (%): 1.7%
Memory size: 7.9 KiB
Minimum: 1938-06-08 00:00:00
Maximum: 2002-02-27 00:00:00

[Figure: Histogram with fixed size bins (bins=50)]

job_title
Categorical

HIGH CARDINALITY
MISSING

Distinct: 184
Distinct (%): 20.6%
Missing: 106
Missing (%): 10.6%
Memory size: 7.9 KiB

Associate Professor: 15
Environmental Tech: 14
Software Consultant: 14
Chief Design Engineer: 13
Cost Accountant: 12
Other values (179): 826

Length

Max length: 36
Median length: 18
Mean length: 18.08836689
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 45
Unique (%): 5.0%

Sample

1st row: General Manager
2nd row: Structural Engineer
3rd row: Senior Cost Accountant
4th row: Account Representative III
5th row: Financial Analyst

Common Values

Value | Count | Frequency (%)
Associate Professor | 15 | 1.5%
Environmental Tech | 14 | 1.4%
Software Consultant | 14 | 1.4%
Chief Design Engineer | 13 | 1.3%
Cost Accountant | 12 | 1.2%
VP Sales | 12 | 1.2%
Assistant Manager | 12 | 1.2%
Assistant Media Planner | 12 | 1.2%
Senior Sales Associate | 12 | 1.2%
VP Quality Control | 11 | 1.1%
Other values (174) | 767 | 76.7%
(Missing) | 106 | 10.6%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
engineer | 131 | 6.3%
assistant | 82 | 3.9%
manager | 76 | 3.7%
analyst | 66 | 3.2%
iv | 52 | 2.5%
iii | 50 | 2.4%
vp | 46 | 2.2%
ii | 44 | 2.1%
senior | 44 | 2.1%
sales | 44 | 2.1%
Other values (117) | 1444 | 69.5%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

job_industry_category
Categorical

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): 1.1%
Missing: 165
Missing (%): 16.5%
Memory size: 7.9 KiB

Financial Services: 203
Manufacturing: 199
Health: 152
Retail: 78
Property: 64
Other values (4): 139

Length

Max length: 18
Median length: 13
Mean length: 11.31976048
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Manufacturing
2nd row: Property
3rd row: Financial Services
4th row: Manufacturing
5th row: Financial Services

Common Values

Value | Count | Frequency (%)
Financial Services | 203 | 20.3%
Manufacturing | 199 | 19.9%
Health | 152 | 15.2%
Retail | 78 | 7.8%
Property | 64 | 6.4%
IT | 51 | 5.1%
Entertainment | 37 | 3.7%
Argiculture | 26 | 2.6%
Telecommunications | 25 | 2.5%
(Missing) | 165 | 16.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value | Count | Frequency (%)
financial | 203 | 19.6%
services | 203 | 19.6%
manufacturing | 199 | 19.2%
health | 152 | 14.6%
retail | 78 | 7.5%
property | 64 | 6.2%
it | 51 | 4.9%
entertainment | 37 | 3.6%
argiculture | 26 | 2.5%
telecommunications | 25 | 2.4%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

wealth_segment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Mass Customer: 508
High Net Worth: 251
Affluent Customer: 241

Length

Max length: 17
Median length: 13
Mean length: 14.215
Min length: 13

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mass Customer
2nd row: Mass Customer
3rd row: Affluent Customer
4th row: Affluent Customer
5th row: Affluent Customer

Common Values

Value | Count | Frequency (%)
Mass Customer | 508 | 50.8%
High Net Worth | 251 | 25.1%
Affluent Customer | 241 | 24.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value | Count | Frequency (%)
customer | 749 | 33.3%
mass | 508 | 22.6%
high | 251 | 11.2%
net | 251 | 11.2%
worth | 251 | 11.2%
affluent | 241 | 10.7%
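The lowercase word table splits each value into words and weights every word by the row count of the value it came from, so customer = 508 + 241 = 749, and the percentages are taken over all words (749 of 2251 ≈ 33.3%). The sketch below assumes simple whitespace tokenisation after lowercasing; the report's actual tokeniser may treat punctuation differently. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Mapping, Tuple


def word_frequencies(counts: Mapping[str, int]) -> List[Tuple[str, int, float]]:
    """Lowercase, split on whitespace, weight each word by its value's row count."""
    words: Counter = Counter()
    for value, n in counts.items():
        for word in value.lower().split():
            words[word] += n
    total = sum(words.values())  # percentages are relative to all words, not rows
    return [(w, c, 100.0 * c / total) for w, c in words.most_common()]


table = word_frequencies({"Mass Customer": 508, "High Net Worth": 251, "Affluent Customer": 241})
for word, count, pct in table:
    print(f"{word} | {count} | {pct:.1f}%")
```

`Counter.most_common` keeps first-seen order among equal counts, which is why high, net and worth come out in the same order as in the table above.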

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

deceased_indicator
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

False: 1000

Value | Count | Frequency (%)
False | 1000 | 100.0%

owns_car
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

False: 507
True: 493

Value | Count | Frequency (%)
False | 507 | 50.7%
True | 493 | 49.3%

tenure
Real number (ℝ≥0)

Distinct: 23
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.388
Minimum: 0
Maximum: 22
Zeros: 2
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 7
median: 11
Q3: 15
95-th percentile: 20
Maximum: 22
Range: 22
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.037144908
Coefficient of variation (CV): 0.442320417
Kurtosis: -0.8128152156
Mean: 11.388
Median Absolute Deviation (MAD): 4
Skewness: 0.07089079797
Sum: 11388
Variance: 25.37282883
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=23)]

Common values

Value | Count | Frequency (%)
9 | 79 | 7.9%
13 | 74 | 7.4%
11 | 68 | 6.8%
10 | 63 | 6.3%
12 | 61 | 6.1%
5 | 60 | 6.0%
7 | 60 | 6.0%
17 | 59 | 5.9%
15 | 58 | 5.8%
8 | 55 | 5.5%
Other values (13) | 363 | 36.3%

Minimum 10 values

Value | Count | Frequency (%)
0 | 2 | 0.2%
1 | 8 | 0.8%
2 | 15 | 1.5%
3 | 26 | 2.6%
4 | 36 | 3.6%
5 | 60 | 6.0%
6 | 45 | 4.5%
7 | 60 | 6.0%
8 | 55 | 5.5%
9 | 79 | 7.9%

Maximum 10 values

Value | Count | Frequency (%)
22 | 12 | 1.2%
21 | 24 | 2.4%
20 | 22 | 2.2%
19 | 34 | 3.4%
18 | 36 | 3.6%
17 | 59 | 5.9%
16 | 49 | 4.9%
15 | 58 | 5.8%
14 | 54 | 5.4%
13 | 74 | 7.4%
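Because tenure takes only 23 distinct values (0 to 22), the tables of the ten smallest and ten largest values, together with the counts for 10, 11 and 12 from the common-values table, give its complete distribution, and those counts sum to 1000. That makes it possible to recompute the descriptive statistics as a check. The sketch assumes the conventions that reproduce the figures above: sample variance (divide by n - 1), linearly interpolated quantiles, and the bias-adjusted skewness and excess-kurtosis estimators that pandas uses for `Series.skew` and `Series.kurt`. `expand` and `describe` are hypothetical helpers.

```python
import math
import statistics
from typing import Dict, List

# tenure value -> count: 0-9 from the ten smallest values, 13-22 from the ten
# largest, and 10, 11, 12 from the common-values table.
TENURE_COUNTS = {
    0: 2, 1: 8, 2: 15, 3: 26, 4: 36, 5: 60, 6: 45, 7: 60, 8: 55, 9: 79,
    10: 63, 11: 68, 12: 61, 13: 74, 14: 54, 15: 58, 16: 49, 17: 59,
    18: 36, 19: 34, 20: 22, 21: 24, 22: 12,
}


def expand(counts: Dict[int, int]) -> List[int]:
    """Turn a value -> count table back into a sorted list of observations."""
    return sorted(v for v, n in counts.items() for _ in range(n))


def describe(xs: List[int]) -> Dict[str, float]:
    n = len(xs)
    mean = statistics.fmean(xs)
    dev = [x - mean for x in xs]
    s2 = sum(d ** 2 for d in dev)
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    variance = s2 / (n - 1)  # sample variance (n - 1 in the denominator)
    # "inclusive" quantiles interpolate linearly between neighbouring order statistics.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    pct = statistics.quantiles(xs, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    mad = statistics.median([abs(x - median) for x in xs])
    return {
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "cv": math.sqrt(variance) / mean,
        "p5": pct[0], "q1": q1, "median": median, "q3": q3, "p95": pct[-1],
        "iqr": q3 - q1,
        "mad": mad,
        # Adjusted Fisher-Pearson skewness and bias-adjusted excess kurtosis.
        "skewness": n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5,
        "kurtosis": n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
    }


xs = expand(TENURE_COUNTS)
stats = describe(xs)
print(len(xs), sum(xs))  # 1000 11388
print({k: round(v, 6) for k, v in stats.items()})
```

Every recomputed figure (mean 11.388, variance 25.37282883, Q1 7, median 11, Q3 15, MAD 4, skewness 0.0709, kurtosis -0.8128) agrees with the report, which suggests the count tables were decoded correctly.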

address
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

0721 Meadow Ridge Pass: 1
3 Golden Leaf Point: 1
68 Anthes Park: 1
15 Weeping Birch Crossing: 1
45 Becker Place: 1
Other values (995): 995

Length

Max length: 26
Median length: 18
Mean length: 17.582
Min length: 9

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 1000
Unique (%): 100.0%

Sample

1st row: 45 Shopko Center
2nd row: 14 Mccormick Park
3rd row: 5 Colorado Crossing
4th row: 207 Annamark Plaza
5th row: 115 Montana Place

Common Values

Value | Count | Frequency (%)
0721 Meadow Ridge Pass | 1 | 0.1%
3 Golden Leaf Point | 1 | 0.1%
68 Anthes Park | 1 | 0.1%
15 Weeping Birch Crossing | 1 | 0.1%
45 Becker Place | 1 | 0.1%
1969 Melody Lane | 1 | 0.1%
2886 Buena Vista Terrace | 1 | 0.1%
7870 Stuart Crossing | 1 | 0.1%
2382 Anthes Crossing | 1 | 0.1%
51 Hooker Court | 1 | 0.1%
Other values (990) | 990 | 99.0%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
crossing | 59 | 1.9%
park | 59 | 1.9%
center | 58 | 1.9%
lane | 55 | 1.8%
street | 55 | 1.8%
avenue | 55 | 1.8%
point | 54 | 1.7%
hill | 51 | 1.6%
plaza | 50 | 1.6%
court | 49 | 1.6%
Other values (1137) | 2563 | 82.5%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

postcode
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 522
Distinct (%): 52.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3019.227
Minimum: 2000
Maximum: 4879
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2046
Q1: 2209
median: 2800
Q3: 3845.5
95-th percentile: 4508.05
Maximum: 4879
Range: 2879
Interquartile range (IQR): 1636.5

Descriptive statistics

Standard deviation: 848.8957672
Coefficient of variation (CV): 0.2811632803
Kurtosis: -1.142498217
Mean: 3019.227
Median Absolute Deviation (MAD): 635.5
Skewness: 0.4921079268
Sum: 3019227
Variance: 720624.0235
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
2145 | 9 | 0.9%
2232 | 9 | 0.9%
2148 | 7 | 0.7%
3029 | 7 | 0.7%
3977 | 7 | 0.7%
4207 | 7 | 0.7%
2750 | 7 | 0.7%
2168 | 7 | 0.7%
2026 | 6 | 0.6%
2560 | 6 | 0.6%
Other values (512) | 928 | 92.8%

Minimum 10 values

Value | Count | Frequency (%)
2000 | 1 | 0.1%
2007 | 3 | 0.3%
2009 | 2 | 0.2%
2010 | 4 | 0.4%
2011 | 4 | 0.4%
2015 | 1 | 0.1%
2016 | 2 | 0.2%
2017 | 1 | 0.1%
2019 | 3 | 0.3%
2022 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4879 | 1 | 0.1%
4852 | 1 | 0.1%
4818 | 2 | 0.2%
4817 | 2 | 0.2%
4814 | 3 | 0.3%
4744 | 1 | 0.1%
4740 | 2 | 0.2%
4720 | 1 | 0.1%
4717 | 1 | 0.1%
4710 | 1 | 0.1%

state
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

NSW: 506
VIC: 266
QLD: 228

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: QLD
2nd row: NSW
3rd row: VIC
4th row: QLD
5th row: NSW

Common Values

Value | Count | Frequency (%)
NSW | 506 | 50.6%
VIC | 266 | 26.6%
QLD | 228 | 22.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value | Count | Frequency (%)
nsw | 506 | 50.6%
vic | 266 | 26.6%
qld | 228 | 22.8%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

country
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Australia: 1000

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Australia
2nd row: Australia
3rd row: Australia
4th row: Australia
5th row: Australia

Common Values

Value | Count | Frequency (%)
Australia | 1000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Most occurring words

Value | Count | Frequency (%)
australia | 1000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

property_valuation
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.397
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 6
median: 8
Q3: 9
95-th percentile: 11
Maximum: 12
Range: 11
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.758804452
Coefficient of variation (CV): 0.3729626134
Kurtosis: -0.3712799928
Mean: 7.397
Median Absolute Deviation (MAD): 2
Skewness: -0.5576112079
Sum: 7397
Variance: 7.611002002
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=12)]

Common values

Value | Count | Frequency (%)
9 | 173 | 17.3%
8 | 162 | 16.2%
7 | 138 | 13.8%
10 | 116 | 11.6%
6 | 70 | 7.0%
11 | 62 | 6.2%
5 | 57 | 5.7%
4 | 53 | 5.3%
3 | 51 | 5.1%
12 | 46 | 4.6%
Other values (2) | 72 | 7.2%

Minimum 10 values

Value | Count | Frequency (%)
1 | 30 | 3.0%
2 | 42 | 4.2%
3 | 51 | 5.1%
4 | 53 | 5.3%
5 | 57 | 5.7%
6 | 70 | 7.0%
7 | 138 | 13.8%
8 | 162 | 16.2%
9 | 173 | 17.3%
10 | 116 | 11.6%

Maximum 10 values

Value | Count | Frequency (%)
12 | 46 | 4.6%
11 | 62 | 6.2%
10 | 116 | 11.6%
9 | 173 | 17.3%
8 | 162 | 16.2%
7 | 138 | 13.8%
6 | 70 | 7.0%
5 | 57 | 5.7%
4 | 53 | 5.3%
3 | 51 | 5.1%

Rank
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 324
Distinct (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 498.819
Minimum: 1
Maximum: 1000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 50
Q1: 250
median: 500
Q3: 750.25
95-th percentile: 948.15
Maximum: 1000
Range: 999
Interquartile range (IQR): 500.25

Descriptive statistics

Standard deviation: 288.8109971
Coefficient of variation (CV): 0.5789895675
Kurtosis: -1.200749808
Mean: 498.819
Median Absolute Deviation (MAD): 250
Skewness: 0.001245859611
Sum: 498819
Variance: 83411.79203
Monotonicity: Increasing

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
760 | 13 | 1.3%
259 | 12 | 1.2%
455 | 9 | 0.9%
133 | 9 | 0.9%
386 | 9 | 0.9%
904 | 9 | 0.9%
700 | 8 | 0.8%
312 | 8 | 0.8%
820 | 8 | 0.8%
536 | 8 | 0.8%
Other values (314) | 907 | 90.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 3 | 0.3%
4 | 2 | 0.2%
6 | 2 | 0.2%
8 | 2 | 0.2%
10 | 2 | 0.2%
12 | 1 | 0.1%
13 | 1 | 0.1%
14 | 2 | 0.2%
16 | 1 | 0.1%
17 | 2 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
1000 | 1 | 0.1%
997 | 3 | 0.3%
996 | 1 | 0.1%
994 | 2 | 0.2%
993 | 1 | 0.1%
988 | 5 | 0.5%
987 | 1 | 0.1%
985 | 2 | 0.2%
983 | 2 | 0.2%
979 | 4 | 0.4%

Value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 324
Distinct (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8817140938
Minimum: 0.34
Maximum: 1.71875
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0.34
5-th percentile: 0.45655625
Q1: 0.64953125
median: 0.86
Q3: 1.075
95-th percentile: 1.40625
Maximum: 1.71875
Range: 1.37875
Interquartile range (IQR): 0.42546875

Descriptive statistics

Standard deviation: 0.293524508
Coefficient of variation (CV): 0.3329021392
Kurtosis: -0.4524719248
Mean: 0.8817140938
Median Absolute Deviation (MAD): 0.213125
Skewness: 0.4299025249
Sum: 881.7140938
Variance: 0.08615663677
Monotonicity: Decreasing

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0.6375 | 13 | 1.3%
1.0625 | 12 | 1.2%
0.945625 | 9 | 0.9%
0.8925 | 9 | 0.9%
0.5 | 9 | 0.9%
1.2375 | 9 | 0.9%
1.02 | 8 | 0.8%
0.6875 | 8 | 0.8%
0.584375 | 8 | 0.8%
0.825 | 8 | 0.8%
Other values (314) | 907 | 90.7%

Minimum 10 values

Value | Count | Frequency (%)
0.34 | 1 | 0.1%
0.357 | 3 | 0.3%
0.374 | 1 | 0.1%
0.3825 | 2 | 0.2%
0.391 | 1 | 0.1%
0.3995 | 5 | 0.5%
0.4 | 1 | 0.1%
0.408 | 2 | 0.2%
0.41 | 2 | 0.2%
0.4165 | 4 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
1.71875 | 3 | 0.3%
1.703125 | 2 | 0.2%
1.671875 | 2 | 0.2%
1.65625 | 2 | 0.2%
1.640625 | 2 | 0.2%
1.625 | 1 | 0.1%
1.609375 | 1 | 0.1%
1.59375 | 2 | 0.2%
1.5625 | 1 | 0.1%
1.546875 | 2 | 0.2%
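Monotonicity compares each value with the next one in row order. Rank is reported as Increasing and Value as Decreasing, and neither can be strict: the first rows already contain ties (Rank 1 three times, Value 1.71875 three times). The first ten rows shown under Sample are also consistent with Rank being a standard competition ranking of Value in descending order (1, 1, 1, 4, 4, 6, ...), which would explain why the two columns are flagged as highly correlated. In the sketch below, `monotonicity` and `competition_rank` are hypothetical helpers, and the "Strictly ..." labels are assumed wording.

```python
from typing import List, Sequence


def monotonicity(xs: Sequence[float]) -> str:
    """Classify a sequence by comparing consecutive values in row order."""
    pairs = list(zip(xs, xs[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


def competition_rank(values: Sequence[float]) -> List[int]:
    """Rank in descending order; tied values share the best rank (1, 1, 1, 4, ...)."""
    return [1 + sum(1 for w in values if w > v) for v in values]


# Value for the first ten rows shown under "Sample".
first_values = [1.71875] * 3 + [1.703125] * 2 + [1.671875] * 2 + [1.65625] * 2 + [1.640625]
first_ranks = competition_rank(first_values)
print(first_ranks)                 # [1, 1, 1, 4, 4, 6, 6, 8, 8, 10]
print(monotonicity(first_ranks))   # Increasing
print(monotonicity(first_values))  # Decreasing
```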

Interactions

[Figure: Interaction plots for pairs of numeric variables]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
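Following that definition, ρ is Pearson's r computed on the rank variables. Tied values need a rule; the sketch gives each tie the average of the positions it occupies, a common convention assumed here rather than taken from the report. `average_ranks`, `pearson` and `spearman` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values all get the mean of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(round(spearman(xs, ys), 6))  # 1.0
print(pearson(xs, ys) < 1)         # True: the points do not lie on a straight line
```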

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship y = a·x + b with a > 0, r equals 1 whatever the slope a.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
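A direct transcription of that definition, which also demonstrates the invariance property: shifting and positively rescaling either variable leaves r unchanged, flipping the sign of one variable flips the sign of r, and points on any increasing straight line give r = 1. The sample data are made up for illustration, and `pearson` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance of X and Y divided by the product of their sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(xs, ys)
r_rescaled = pearson([10 * x - 7 for x in xs], [0.5 * y + 3 for y in ys])  # location/scale change
r_flipped = pearson([-x for x in xs], ys)                                  # negative scale
r_line = pearson(xs, [3 * x + 1 for x in xs])                              # exact line, slope 3
print(round(r, 6), round(r_rescaled, 6), round(r_flipped, 6), round(r_line, 6))  # 0.8 0.8 -0.8 1.0
```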

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
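The pair-counting definition translates directly into code. This is the tau-a variant: tied pairs count as neither concordant nor discordant, so τ cannot reach ±1 when ties are present. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted tau-b instead, so results can differ on tied data. The loop visits all n(n-1)/2 pairs, which is fine for illustration; `kendall_tau_a` is a hypothetical helper.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs, where total = n(n-1)/2."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # they move in opposite directions
        # s == 0: a tie in X or Y, counted in neither group
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
```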

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
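The bias correction can be sketched from a contingency table of observed counts. This follows the commonly cited form of Bergsma's estimator: φ² = χ²/n is reduced by its expected value under independence (clipped at 0), and the table dimensions r and k are corrected in the same spirit. It assumes no row or column of the table is entirely zero, and it is an illustration rather than the report's implementation; the helper names are hypothetical.

```python
import math
from typing import Sequence


def chi_squared(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-squared statistic of an r x k contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v_corrected(table: Sequence[Sequence[int]]) -> float:
    """Bias-corrected Cramer's V (Bergsma-style correction)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    phi2 = chi_squared(table) / n
    # Subtract the expected value of phi^2 under independence, clipped at 0.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Corrected numbers of rows and columns.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association: about 1.0
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence: 0.0
```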

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available for it.

Missing values

[Figure: Missing values by column]
A simple visualization of nullity by column.
[Figure: Nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure: Nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
[Figure: Nullity dendrogram]
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
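The count and heatmap views reduce to simple computations on a missingness indicator (1 if a cell is missing, 0 otherwise). In this dataset only last_name, DOB, job_title and job_industry_category have missing cells. The sketch assumes the heatmap follows the approach of the missingno library, a Pearson correlation between indicator columns, with columns that are never (or always) missing left out because their indicator has no variance. `missing_counts` and `nullity_correlation` are hypothetical helpers, and the rows are made up.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]  # None marks a missing cell


def missing_counts(rows: List[Row], columns: Sequence[str]) -> Dict[str, int]:
    """Number of missing cells per column."""
    return {c: sum(1 for row in rows if row.get(c) is None) for c in columns}


def nullity_correlation(rows: List[Row], a: str, b: str) -> float:
    """Pearson correlation between the missingness indicators of columns a and b."""
    ia = [1.0 if row.get(a) is None else 0.0 for row in rows]
    ib = [1.0 if row.get(b) is None else 0.0 for row in rows]
    n = len(rows)
    ma, mb = sum(ia) / n, sum(ib) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ia, ib))
    va = sum((p - ma) ** 2 for p in ia)
    vb = sum((q - mb) ** 2 for q in ib)
    if va == 0 or vb == 0:
        return math.nan  # never (or always) missing: no variance to correlate
    return cov / math.sqrt(va * vb)


# Made-up rows in which job_title and job_industry_category go missing together.
rows = [
    {"job_title": None, "job_industry_category": None},
    {"job_title": "VP Sales", "job_industry_category": "Retail"},
    {"job_title": None, "job_industry_category": None},
    {"job_title": "Paralegal", "job_industry_category": "Property"},
]
print(missing_counts(rows, ["job_title", "job_industry_category"]))
print(nullity_correlation(rows, "job_title", "job_industry_category"))  # 1.0
```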

Sample

First rows

first_namelast_namegenderpast_3_years_bike_related_purchasesDOBjob_titlejob_industry_categorywealth_segmentdeceased_indicatorowns_cartenureaddresspostcodestatecountryproperty_valuationRankValue
0ChickieBristerMale861957-07-12General ManagerManufacturingMass CustomerNYes1445 Shopko Center4500QLDAustralia611.718750
1MorlyGeneryMale691970-03-22Structural EngineerPropertyMass CustomerNNo1614 Mccormick Park2113NSWAustralia1111.718750
2ArdelisForresterFemale101974-08-28Senior Cost AccountantFinancial ServicesAffluent CustomerNNo105 Colorado Crossing3505VICAustralia511.718750
3LucineStuttFemale641979-01-28Account Representative IIIManufacturingAffluent CustomerNYes5207 Annamark Plaza4814QLDAustralia141.703125
4MelindaHadleeFemale341965-09-21Financial AnalystFinancial ServicesAffluent CustomerNNo19115 Montana Place2093NSWAustralia941.703125
5DruciBrandliFemale391951-04-29Assistant Media PlannerEntertainmentHigh Net WorthNYes2289105 Pearson Terrace4075QLDAustralia761.671875
6RutledgeHalltMale231976-10-06Compensation AnalystFinancial ServicesMass CustomerNNo87 Nevada Crossing2620NSWAustralia761.671875
7NancieVianFemale741972-12-27Human Resources Assistant IIRetailMass CustomerNYes1085 Carioca Point4814QLDAustralia581.656250
8DuffKarlowiczMale501972-04-28Speech PathologistManufacturingMass CustomerNYes5717 West Drive2200NSWAustralia1081.656250
9BarthelDocketMale721985-08-02Accounting Assistant IVITMass CustomerNYes1780 Scofield Junction4151QLDAustralia5101.640625

Last rows

first_namelast_namegenderpast_3_years_bike_related_purchasesDOBjob_titlejob_industry_categorywealth_segmentdeceased_indicatorowns_cartenureaddresspostcodestatecountryproperty_valuationRankValue
990JermaineBagshaweFemale601954-05-14Help Desk OperatorPropertyMass CustomerNYes9260 Briar Crest Drive4209QLDAustralia69880.3995
991BryanJachtymMale591974-05-15Automation Specialist IManufacturingMass CustomerNYes1556 Moland Crossing3356VICAustralia39880.3995
992RenieLaundonFemale321973-12-18Assistant Media PlannerEntertainmentMass CustomerNYes81 Shelley Pass4118QLDAustralia39930.3910
993WeidarEtheridgeMale381959-07-13Compensation AnalystFinancial ServicesMass CustomerNYes60535 Jay Point2422NSWAustralia49940.3825
994DathaFishburnFemale151990-07-02Office Assistant IVRetailMass CustomerNNo36 Caliangt Way3079VICAustralia129940.3825
995FerdinandRomanettiMale601959-10-07ParalegalFinancial ServicesAffluent CustomerNNo92 Sloan Way2200NSWAustralia79960.3740
996BurkWortleyMale222001-10-17Senior Sales AssociateHealthMass CustomerNNo604 Union Crossing2196NSWAustralia109970.3570
997MelloneyTembyFemale171954-10-05Budget/Accounting Analyst IVFinancial ServicesAffluent CustomerNYes1533475 Fair Oaks Junction4702QLDAustralia29970.3570
998DickieCubbiniMale301952-12-17Financial AdvisorFinancial ServicesMass CustomerNYes1957666 Victoria Way4215QLDAustralia29970.3570
999SylasDuffillMale561955-10-02Staff Accountant IVPropertyMass CustomerNYes1421875 Grover Drive2010NSWAustralia910000.3400